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AN ITERATIVE PROCEDURE FOR GENERAL PROBABILITY 
MEASURES TO OBTAIN I-PROJECTIONS ONTO 
INTERSECTIONS OF CONVEX SETS 

By Bhaskar Bhattacharya 

Southern Illinois University 

The iterative proportional fitting procedure (IPFP) was intro- 
duced formally by Deming and Stephan in 1940. For bivariate densi- 
ties, this procedure has been investigated by Kullback and Riischendorf. 
It is well known that the IPFP is a sequence of successive I-projections 
onto sets of probability measures with fixed marginals. However, 
when finding the I-projection onto the intersection of arbitrary closed, 
convex sets (e.g., marginal stochastic orders), a sequence of succes- 
sive I-projections onto these sets may not lead to the actual solution. 
Addressing this situation, we present a new iterative I-projection al- 
gorithm. Under reasonable assumptions and using tools from Fenchel 
duality, convergence of this algorithm to the true solution is shown. 
The cases of infinite dimensional IPFP and marginal stochastic orders 
are worked out in this context. 

1. Introduction. For two probability measures (PM) P and Q denned 
on an arbitrary measurable space (X,B), the I-divergence or the Kullback- 
Leibler distance between P and Q is defined as 



i(P\Q) 



J \n(dP/dQ)dP, if P<Q, 
+00, otherwise. 



If R is any PM with P <C R,Q <C R, then I(P\Q) can equivalently be ex- 
pressed as I{P\Q) = J(dP/dR)ln((dP/dR)/(dQ/dR))dR. Here and in the 
sequel we observe the conventions that InO = —00, ln(a/0) = +00, • (±00) = 
0. 

Although I(P\Q) is not a metric, it is always nonnegative and equals 
if and only if P = Q (a.e.). Hence, it is often interpreted as a measure 
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of "divergence" or "distance" between P and Q. Other popular names of 
I(P\Q) are information for discrimination, cross-entropy, information gain, 
and so on. 

For a given Q and a specified set of PM's C, it is often of interest to find 
the RgC which satisfies 



This R is called the I-projection of Q onto C. So, the I-projection, when 
defined, corresponds to a finite I-divergence. Csiszar [6] has shown that R 
exists uniquely if C is variation-closed and there exists P 6 C such that 
I(P\Q) < oo. Csiszar [6] also gives a characterization of R as follows: R is 
the I-projection of Q onto the convex set C if and only if 



for every P E C (equality holds if R is an algebraic inner point of C). 

If Q is a finite measure but not a PM, with c = J dQ and Q' = Q/c, the 
I-divergence between a PM P and Q is given by I(P\Q) = I(P\Q') — Inc. 
It can be seen easily that the same characterization result in (1.2) holds, that 
is, Q and Q' would lead to the same I-projection onto C although I(P\Q) 
need not be nonnegative if Q is not a PM. 

I-projections play a key role in the information theoretic approach to 
statistics [12, 19]. The areas of maximization of entropy [16, 23] and the the- 
ory of large deviations [27] also use this concept. The iterative proportional 
fitting procedure (IPFP) [9, 18] commonly used in contingency tables is ac- 
tually an iterative algorithm of successive I-projection problems [6, 7, 8, 15]. 
Maximum likelihood estimation in log-linear models for multinomial distri- 
butions is equivalent to solving an I-projection problem [11, 13, 14]. 

Depending on the form of the set C, it may be difficult to find a solu- 
tion to the I-projection problem in (1.1). In the discrete case, if C can be 
expressed as f)^ =1 Cj where each Ci is a closed, linear set [i.e., P\,P2 £Ci^- 
olP\ + (1 - a)P 2 G Ci, for all a for which olP\ + (1 - a)P 2 is a PM], then 
Csiszar [6] has shown that the sequence of cyclic iterated I-projections onto 
individual Ci converges to the solution of (1.1). Dykstra [10] modified this 
procedure to work for the case when Ci are arbitrary closed, convex sets 
subject to a limiting condition which was later removed by Winkler [28]. 
Later Bhattacharya and Dykstra [3] interpreted Dykstra's procedure in the 
context of Fenchel duality. 

In the infinite-dimensional bivariate case, the IPFP is fitting (adjusting) 
a PM to two given marginals. Kullback [20] considered this problem, and 



(1.1) 




I(P\Q)>I(P\R)+I(R\Q), or 



(1.2) 
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Riischendorf [25] proved its convergence. When finding the I-projection of a 
general PM Q on a convex set C, Bhattacharya and Dykstra [2] used Fenchel 
duality to identify an equivalent dual problem which might be easier to solve 
(depending on C) than the I-projection problem at hand. They worked out 
several examples to demonstrate the utility of the duality approach. 

In this paper we consider the infinite-dimensional case when the con- 
straint set can be expressed as an intersection of a finite number of arbitrary 
variation-closed, convex sets. The motivation for this paper comes from the 
fact that successive iterative I-projections, as considered by Riischendorf 
[25], onto these sets may not lead to the actual solution (see Example 3.1 
for an analytical example). The rest of the paper is organized as follows. 
In Section 2 we introduce the necessary notation and preliminaries to the 
problem. An iterative algorithm, a modified version of the discrete case 
[3, 10], is presented in Section 3 which is shown to converge to the correct 
solution. In dual formulation, a cyclic descent algorithm is obtained which 
amounts to minimizing over one variable at a time while all the others are 
held fixed. We establish the correspondence between the primal and dual 
solutions at every step of the two algorithms, which aids to prove the con- 
vergence of I-projections. In Section 4 we consider the case when marginal 
PM's are stochastically greater than or equal to the given PM's. The infinite- 
dimensional IPFP follows when each C, is defined as a linear set of fixed 
marginals. The duality approach used in this paper to solve this problem 
seems simpler and more intuitive. This generalizes easily to the case when 
more than two variables are involved. 



2. Preliminaries. We begin with some necessary notation. Let the un- 
derlying probability space be denoted by (Q,F, Q), where Q is the sample 
space, T is the a- field of subsets of Q, and Q is a given PM defined on 
elements of J 7 . We will use the notation / f dQ to indicate Jq f(u>) dQ(u>). 

We will work with the normed, linear vector space L\(Q) since I(P\Q) < 
oo implies dP/dQ € L\(Q). Consider the function / defined as 

(2i) = / J xlnxdQ, if x > 0, J xdQ = 1, 

I +oo, otherwise, 

and let 

(2.2) C = {x € L 1 (Q):x = dP/dQ for some P € C}. 
Then the (primal) problem in (1.1) can be expressed as 

(2.3) inf f(x). 

Since the primal space is taken as Li(Q), the dual space is given by Loo(Q) 
[22], which is the space of bounded functions. However, this space is too 
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restrictive, as shown by Bhattacharya and Dykstra [2] (see also Remark 3.2 
in Section 3). A more general dual space M(Sl,F), the set of extended valued 
^"-measurable functions on J7, is more useful in this context. So M(r2,JF) 
will be our dual space as well. 

The convex conjugate of /, denoted by f*(y), for y G M(J7,^ r ), is defined 

as 



(2.4) f*(y)= sup 

x£dom(f ) 



J xydQ- f{x) 



where dom(f) = {x G L\{Q) : f(x) < oo}. Bhattacharya and Dykstra [2] showed 
that, for the function / given by (2.1), the convex conjugate is given by 



(2-5) /*(y)=ln( 



e v dQ 



(Q need not be a PM). 

A subset K of a vector space is said to be a cone if x G K ax G K, V a > 
0. For an arbitrary subset S 1 of L\(Q), the conjugate (or duaZ) cone of S 1 is 
given by 

S® = ^y€~M(Sl,f): J xydQ>0, for all xGsJ. 

The dual problem to (2.3) is given by inf ygC e f*{y)- Theorems 2.1 and 
2.2 [2] below present a sufficient condition which can be used to identify the 
solutions of the I-projection problem and its dual, and express one solution in 
terms of the other. Also, from (2.4) it follows that f{x) + f*(y) > f xydQ > 
0, Vx G S, y G S®. Theorem 2.1 shows that if f{xo) + f*(yo) < for some 
xq G S, yo G 5®, then xq, yo solve the primal and dual optimization problems, 
respectively. 

Theorem 2.1. Assume S is a subset of L\{Q) which intersects dom(f) 
and T C S® . Then xq G S and yo G T are respective solutions of 

inf f(x)= inf / xlnxdQ and inf f*(u)=ln inf / e y dQ 
x£S xeSndom(f)J y£T J Ka ' [y&T J 

if 

(2.6) /(x ) + r(2/o)<0. 

Moreover, in this situation mi x ^s f{x) = — valy^T f*{y)- 

In the trivial case when T = {0}, it is easy to see that j/o = 0) f*(yo) = 0. 
Then (2.6) gives that f(xo) < 0. Since f(x) > Vx, we have /(xo) = 0. From 
Theorem 2.1, inf^gs /(x) = — inf^gT f*{y) = 0. Thus, we have Q G C. 



I-PROJECTIONS ON CONVEX SETS 



5 



In (2.5), substituting y(ui) = az(ui),a > 0,co € for a fixed z € M(Q, f), 
one may interpret f*{y) as the cumulant generating function of a random 
variable z (with PM Q) when /* is evaluated along the one-dimensional 
path y = otz. Thus, minimizing /* over the (one-dimensional) region K,® = 
\v:v(u) = az(iv),a > 0} is equivalent to minimizing I(P\Q) over the set 
K. = {x = dP/dQ £ L\(Q) : J zxdQ > 0} since K,® is the conjugate cone of K,. 
In other words, Theorem 2.1 shows that minimizing the cumulant generating 
function over nonnegative values of a is equivalent to finding the I-projection 
of Q onto the set of distributions over z{oS) values with a nonnegative mean. 

Theorem 2.2. Assume S is a subset of L\{Q) and yo is a solution to 
hifyes© / & y dQ < oo. Then xq = e yo / J e yo dQ is the solution to 



ifxo £ S. If S is either (1) convex, variation- closed and contained in dom(f) 
or (2) a variation-closed, convex cone, then xq € S, and, hence, xq must 
solve (2.7). 

Using Theorem 2.2, many exponential families of distributions are ob- 
tained as solutions to the I-projection problem when S contains appropriate 
moment constraints [2]. From now on, to use the result of Theorem 2.2, we 
will replace Cq in (2.2) and (2.3) by 



3. The algorithm and its convergence. We assume that Q is a given 
PM and there exists V sC such that I(V\Q) < oo. We wish to find the I- 
projection of Q onto C = f)l =1 d, where each d is a variation-closed, convex 
set of PM's. We first state the algorithm in the primal form, and then in the 
dual form. The main point is that successive iterative projections may not 
work for general convex sets (see Example 3.1 below). Hence, we propose 
some adjustments to be made before an I-projection is done. 

Primal formulation of the algorithm: 

• Initialization: Set Sq^ = Pq^ = Q and begin with n = 1, i = 1. 

• Implementation: 



(2.7) 




/Co = {ax : x £ Cq, a > 0}. 



1. Set 



(3.1) 




for i = 1, and 
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(using the conventions described in the Introduction), and find P n ^ = 
^i(S n ,i) [where %i(S n ^) is the I-projection of S n ^ onto Cj\. 
2. If i < t, increase i by 1 and go to step 1 above. If % = t, increment n by 
1, set i = 1 and go to step 1 above. 

In words, to implement the algorithm for the primal problem, at the first 
cycle we find the I-projection of Py-i onto C;, namely, Pi j for 1 < i < t 
(with P±fi = Q). At the nth cycle ith step (n > 2, 1 < i < t), we first form 
dS nt i/dQ by adjusting the current I-projection density dP n _i it /dQ for i = 1, 
dP nt i-i/dQ for i > 2. The adjustment amounts to taking out the effect of 
the last I-projection from the previous cycle, which is dP n -i^/dS n -i t i. Thus, 
although at the first cycle S\ t i = Q,S\^ = P\^-\,i > 2, are all PM's, for 
n > 2, due to the adjustment made, the measure S n i need not be a PM. 
Since the algorithm finds the I-projection from S n> i, we need S n> i to be a 
finite measure so that (1.2) may be used in the proof of convergence of the 
algorithm. Thus, the following assumption is made: 

Assumption A. sup n A J dS n ^ = Mi < oo. 

We show below (following Lemma 3.1) that if each Cj is a linear set, 
then no adjustment is necessary, Sn,i — Pn — l^i^n,? — Pn,i — ^ 2, Vtz are 
all PM's, and Assumption A clearly holds. In general, however, it is not 
easy to identify situations when Assumption A may be violated. One can 
monitor the value of M\ while executing the algorithm (see the discussion 
following Theorem 3.1). One faces similar difficulty in the discrete case [10]. 
In the remarks following Example 3.2, different choices of f2 and Cj's are 
considered to discuss Assumption A. 

We show in Lemma 3.1 that the I-projections P n .i and the corresponding 
densities dP nt i/ dS n: i defined by the algorithm exist Vra, i. In the discrete case, 
Dykstra [10] assumed that the individual I-projections exist. Later Winkler 
[28] proved such existence; however, his proof depends on the discreteness 
of the sample space. 

Lemma 3.1. We assume that Q is a given PM and there exists V £ C 
such that I(V\Q) < oo. Then the densities dP n ^/dS n ^ defined by the algo- 
rithm exist \/n,i. 

Proof. We prove this by induction. First, we show that, in the first 
cycle, all I-projections exist. By assumption, the projection of Q onto C±, 
that is, Pi,!, exists. Applying (1.2) with P = V G C\ (since V <E fliU^) and 
R = p 1>u we get I(V\P lt i) < I(V\Q) < oo. Since V G C 2 also, it follows that 
Pi 5 2 exists. Thus, continuing in this way, it follows that each of Pi 3, . . . , Pi,& 
exists. Now assume that all I-projections exist up to the nth cycle, (i — l)st 
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step. To show that P n ^ exists, it is enough to verify that I{V\S n ,i) < oo 
(since V S Cj). It follows from (3.1) that dS n> i/dQ = Y¥j=i(j^i){dPa,j/dS at j), 
where a = n if j < i, otherwise a = n— 1. Then 

t 

~ E HPaj\Sa,j) 
<I(V\Q)~ J2 ^aJSaj), 

since, by (1.2), every term in the middle sum is nonnegative (with V = 
P,R = P a ,j,Q = S a ,j) and using the induction hypothesis. Now replacing 

S n ,i by S' n i = S n>i / J dS n>i , we get 

I(V\S' n ,i)+ E I(P a JS' aJ )<I(V\Q)+j2^ I dS aJ 

</(y|Q)+tlnMi<oo 

by Assumption A. Thus, I(V\S' ni ) < oo, and consequently, I(V\S n> i) = 
I(V\S' n 4 ) — In / dS n: i < oo by Assumption A. Hence, P n ^ exists, I(P n ,i\S n ,i) < 
oo and the densities dP n ^j dS n ^ exist Vn,i. □ 

One may also observe the following facts concerning the I-projections ob- 
tained from the algorithm. Using (1.2), it follows that, for A C Cl,iTi(Q)(A) = 
would imply either Q(A) = or P(A) = for all PgQ [with I(P\Q) < oo}. 
By reasoning inductively, it follows that S n i(A) = would imply either 
Q(A) = or P(A) = for all P € d [with l\P\S n)i ) < oo]. It also follows 
that once an I-projection assigns mass (a.e. Q) to a set A in $7, all subse- 
quent I-projections also assign mass (a.e. Q) to the same set A. 

If the Cj's were actually linear sets so that equality holds in (1.2), our 
procedure would reduce to the successive iterative projections algorithm 
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and no adjustment is needed. To see this, let P G Cj. Then using (3.1), 

From (1.2), setting Q = S n -i,i, R = P n -i,i, it follows that 

In — dS n - h i = 0, 



dS n —i i dS n —i i ) \dS n 



-1A 



or J(dP/dQ — dP n -i : i/dQ)ln(dP n -i^/dS n -i j i) dQ = 0. Hence, the last line 
of (3.2) is equal to 



\dP n j-i/dQ J J dQ \dS n -i ti ; 

= I(P\P nj i_i) + I(P ri -i l i|jS ri _i ) i). 

Clearly, the P G Cj which minimizes /(P|5 nj j) is the same as the P which 
minimizes I{P\P n ^\). Thus, here we can take S n> i = P n> i-i, and, conse- 
quently, Assumption A is always true. 

The following simple example demonstrates that the successive iterative 
projections procedure does not work for general convex sets. 

Example 3.1. Suppose we like to find the I-projection of Q [= the uni- 
form distribution on (0, 1)] onto the class of all distributions on (0, 1) whose 
first two moments are at least 0.7. The constraint region can be expressed 
as Ci nC 2 , where C x = {P:E P {X) > 0.7}, C 2 = {P:E P (X 2 ) > 0.7}, where 
A is a random variable on (0,1). It can be easily seen that the succes- 
sive iterative projections algorithm produces the solution R with density 
r( x ) = e 2 - 672x+1 - 943x2 /17.120, < x < 1. However, the correct solution is R* 
with density r*(x) = e 3 - 932x2 /7.845, < x < 1. 

In the dual formulation, the algorithm is a cyclic descent algorithm which 
successively minimizes over one function at a time while the others are held 
fixed. The dual problem is equivalent to 

inf fe y dQ= inf / e y dQ 

ve<PLi lc *) e>J yeci(Kf+-+K?)J 



inf / e yi+ - +Vt dQ. 
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Typically the dual constraint region would be the direct sum (/C® H \-K,f). 

However, the direct sum of closed sets need not be closed [1], and hence 
our dual constraint region is the closure of the direct sum. At each step of 
the primal algorithm, a dual problem can be identified using Theorem 2.2. 
Following the duality approach, not only does the process seems simpler and 
more intuitive, but also, depending on the constraints, the dual problem may 
be more tractable than the I-projection problem. 
Dual formulation of the algorithm: 

• Initialization: Set yo,i = and begin with n = l,i = 1. 

• Implementation: 

1. Let y nt i denote the solution to 

inf / e' y "' lH H/n,i-l+l/+!/n-l,i+lH — h/n-1,* 

yeKfJ 

2. If i < t, increase i by 1 and go to step 1. If i = t, increment n by 1, set 
i = 1 and go to step 1. 

In the following we present five lemmas which are crucial in proving the 
main result of this paper (Theorem 3.1). Recall, for a closed convex cone /C, 
the dual problem can be stated as 

(3.3) inf In f e y dQ. 

To begin, in Lemma 3.2 we establish a necessary condition for the solution 
of (3.3). 

Lemma 3.2. Let K,® be the conjugate of a closed convex cone K,. If yo 
solves the dual problem in (3.3), then 



(3.4) 



J y e M dQ = 0, 
Jye yo dQ>0 VyG/C®. 



Proof. Since yo minimizes g(y) = In / e y dQ on the convex set /C® and 
g is Gateaux differentiable at yo, we get (d/da)g(ay + (1 — a)yo)U=o > 0. 
Applying this for the function g(y), we get (d/da)ln f e Qy+ ( 1_Q ) ?/0 dQ\ a= o > 
0, or 

r P yo 

(v-Vo)-r 77^dQ>0 VyG/C®. 

By choosing y = cyo first with c > 1 and then with c < 1 (since K,® is a cone) , 
we obtain the results given in (3.4). □ 
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Now we are able to relate the solutions from the primal and dual algo- 
rithms. 

Lemma 3.3. At the (n,i)th step, the solutions from the two algorithms 
are related by 

dP n>i 



(3.5) 



Also, S Uj i and P n ^ can be expressed as 



and I(P n ,i\S n> i) = -ln 



J e y ^ <IS„. 



(3.6) 



dS n j 
~dQ~ 

dPji,i 

~dQ~ 



Cn.iC 3 



E F 1 fcj+Et t +l fn-1,3 



n—l,j 



ELlJ/n,j+Ej=i+lfn-l, 



where c n ^ is given by 
(3.7) 



1 -1 



71-1 

><n 

k=l 



j e Sj = l *,3'+Ej=i+l Vk-1,3 d Q 



I* 



Ej=l f*,j'+X^7=i 



1 < i < t ( assuming E?=i Vn,j = Ej=t+i Vn,j — 0, yo,j = 0). In addition, 



(3.8) 



/ Vn,i dP n ,i = 0, JydP n4 >0 V y € Kf 



The I-divergences can be expressed in terms of dual solutions as 

f e Sj=i ffc,j+Z)j=i+i y^—i.j d Q 



I ( Pn.i\S> 



n.i \ t -"n,i l 



(3.9) 



Y^I{Pn,i\Sn,i) = -\n f 
i=l J 



In / : <!(). 



Proof. From Theorem 2.2, at the (n,i)th step of the primal algorithm, 
the solution to inf^g^ / xlnx dS n< i is given by dP n j/dS nj i = e Vn ' i / f e 27 "^ dS n ,i, 
where y n ^ solves the dual problem mf y&fC e, / e v dS n ^. Also, by Theorem 2.1 

we obtain I(P n ,i\ S n,i) = -ln/e^ dS n ,i- 

We prove (3.6) and (3.7) by induction. For n = 1, we have Sij = Pi^-i, 
and, consequently, 

dP 1A= dP lti _ e y^ 
dS 1>{ dP M _i jey^dP^ 
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Applying this equation recursively for i — 1, . . . , 1 and combining the result- 
ing equations, we obtain dP\^jdQ = e^i =1 ' j / J e^ =1 ,j dQ with = 
(j- e EJ =1 vw d gyi f or i < i < t. Thus, (3.6) and (3.7) hold for n = 1. Sup- 
pose (3.6) and (3.7) also hold up to step (n,i — 1) of the algorithm. To show 
these equations hold for the (n,i)th step, we start with the definition of 
dS n> i/dQ in (3.1) and then apply (3.5) for (n — 1, i) to obtain 

dS n i dP n i—\ f dP n —ii^ 



dQ dQ \dS n -i i 



E i_ 



=Ej=l f».J+E 3 -=i+l fn-l,j 



J- e E;=iW»j+E i=j »»- ^ 

x /" e ?/ ™- 1 ^c n _i ) ie^=i 2/ ™- 1,i+ ^i=i+i !/ ™- 2 ^ 

r . . f P Ej=i!'«-ij+E J =,+i!'"- 2 J v-^i-i . v-^t 

j e E;=i^.i+E- =i f»-i.^ (/ q 

where is defined in (3.7). Also, using the expression for dS n ^/dQ derived 
above, we get 

dPn,i dP n i dS n i 



dQ dS n i dQ 



oVn.i v-^vz — 1 V^* 

Xfci^i+Efci^-i, 



Jey^(dS n ,i/dQ)dQ 

e Ej-=iW».j+E*-=i+i 



J ,E, : JM.J+E^+i ,/Q 

which proves the desired results. 

In (3.4), setting Q = 5 n;i ,yo = Vn,i, it follows that J y n ,ie Vn ' i dS nii = 0, 
which, using (3.5), gives / y n ,idP n ,i = as in (3.8). The rest of (3.8) follows 
from the fact that dP n ^/dQ € /Q and y € K.f. 

To derive (3.9), note that 

I(P n ,i\Sn,i) = -ln J e yn > i dS nii = -\n(c n>i J X, i*^+E* ^ dQ^j. 
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The expression for I(P n ,i\S n ,i), in (3.9), is obtained using the value of c^j from 

(3.7). Also, writing a K% = J e^U vh-u ^ weget £* =1 /(p^S^ 

Ej = iELi- ln ( fl fe,i/ tt fc,«-i) = -lna„ it (assuming a fcj0 = a fe _i,t, ao,i = !)• This 
proves the lemma. □ 

The next lemma essentially proves that the successive I-divergences are 
nondecr easing in nature. 

Lemma 3.4. Under Assumption A: 

(i) I(Pn,i\S n ,i) — I(Pn-l,i\S n -l,i) > I(Pn,i\Pn,i-l)- 

(ii) I(P n ,i\S n ,i) is nondecreasing in n, for every i. 
Proof. To prove (i), we may write 

P\Pn,i\S n ,i) ~ P\Pn—l,i\Sn—X,i) 

l( dP n>i - [ lnf dP n ^ 

A dP n ; - ( iJ dP n<i - flJ ^ZIA dp 



dQ ) J \ dQ ) ' J \dS n -n/ 

\dS J n_1 ' i 

^ PyPn,i\Pn,i— l)i 

since the second term is nonnegative by using the characterization of I- 
projection in (1.2) [with P = P n ,i,R = P n -i,i and Q = S n -i,i in the first 
line of (1.2), we can write f\a.{dP n ^/dS n -i^){dP n ^/dQ)dQ> fln(dP n> i/ 
dP n _i t i)(dP nt i/dQ) dQ + f]n.{dP n -i t i/dS n -i t i)(dP n -i ti /dQ)dQ, and the re- 
sult follows by moving all terms to the left of the inequality and algebra] 
and using Assumption A. The case when i = 1 follows similarly. 
Proof of (ii) follows from (i) since I(P nj j|P n .j_i) > 0. □ 

The next lemma shows that the I-divergences obtained from the algorithm 
are also uniformly bounded above. 

Lemma 3.5. Under Assumption A, I(P n ,i\S n ,i) are bounded above uni- 
formly in n,i. 
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Proof. By assumption, 3V G C such that I(V\Q) < oo. Writing 
dP n ^/dQ = (dP nt i/ dS n ^)(dSn t i/ dQ) and using (3.1), it follows that dP n ^/dQ = 
Ylj =1 (dP a j I 'dS a j) , where a = n if j <i, and otherwise a = n — 1. Then 

(3.10) 

-^>-g/(£-^)K^)« 

<i(y\Q)-^2i(p a j\s atj ), 

since by (1.2) [with P = V, R = P a j and Q = S a j in (1.2) and decompos- 
ing (1.2) as shown in the proof of Lemma 3.4] every term in the middle sum 
is nonnegative under Assumption A. Thus, 

t 

(3.H) i{v\p n ,i) +Y, i ( p *j\ s °j) ^ i{ y\Q) < °°- 

3=1 

Thus, I(P n ,i\Sn,i) are bounded above uniformly in n,i. □ 

Thus, Lemmas 3.4 and 3.5 together imply that 7(P nj j|P n) j_i) — > as n — > 
oo. The following assumption is needed in the proofs of Lemma 3.6 and 
Theorem 3.1. 

Assumption B. sup n i /(ln^i)dP nii = M 2 < oo. 

Since I{P n ,i\S n ,i) = I(Pn,i\Q) — J^(dS n ,i/dQ) dP n ,i, Assumption B would 
hold if one could show I(P n ^\Q) are uniformly bounded above. In the case of 
two linear set constraints of fixed marginals, Riischendorf ([25], Lemma 4.4) 
shows this to be true. But for general convex sets, this does not follow easily 
and it is also difficult to identify situations when M 2 = oo may occur. Of 
course, one can monitor the value of M 2 along with M\ while executing the 
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algorithm (see the discussion following Theorem 3.1). In the Remarks 3.1-3.7 
we discuss Assumption B in several cases involving Q and C%. 

The next lemma establishes the uniform integrability of the sequence of 

functions e^ =1 Va ' j (where a = n if j <i, otherwise a = n — 1) obtained from 
the dual algorithm. 

Lemma 3.6. Under Assumption B, the sequence of functions e^= 1?/aj 
is uniformly integrable where a = n if j <i, and otherwise a = n — 1. 

PROOF. From (3.9) we have 

I{Pn,i\S n ,i) ~ I [P-n— l,i\&n—\,i) 

= -l n y eE^iWnJ+E^i+iWn-u dQ + ln J e^=i fe ' J+ ^=> dQ, 
which is nonnegative by Lemma 3.4(i), or 

(3.12) J e D;=iw»j+Ej= i+ iw»-ij < y e E;=iw»j+Ej=i«»-ij ^q. 

Thus, using (3.12), each of the terms in square brackets under the J~[ s i§ n 
in (3.7) is less than (or equal to) 1 for k = 1, . . . , n — 1. Since 

= eX P ^ ^(•Pn.il'Sn.i)! 

by (3.12) and (3.9), the leading term of c n ^ in (3.7) is also uniformly bounded 
above from Lemma 3.5. Thus, we conclude 

(3.13) supc nj j < oo. 

n, i 

Using (3.6) and (3.8), we can write 

y (yn, i + m^-lnc n , i )dP n , i )(| X> dQ 



ln^-lnc nil / "" 



I-PROJECTIONS ON CONVEX SETS 



15 



Now for the continuous convex function 4>{x) = xlnx, we have 
lim x ._ >00 (i^(3;)/a;) = oo. Then using a modified form of the criteria of Valle 
Poussain [21], it is enough to establish that 



sup / 0(e^= l2/o ' J )dQ < oo. 

n.i J 



However, 



sup / c/)(e^J=i ya - ] )dQ 

= sup f X llM <*Q 

J j=l 

= ^(K iJ -w- i ^') dp "'I j:u "" JdQ ) 

< ^SUp J {^ n -^^j dP n,i + SUP In C n ^j 

x sup( / e ^i=i ya < j dQ 
n.i \J J 



which is finite by Assumption B, (3.13) and the fact that J e^ =1 ,3 dQ is 
nonincreasing in i (and n) from (3.12). Hence, the result follows. □ 

The main focus is on the following theorem, which proves convergence of 
the I-projection solutions obtained from the algorithm described above by 
using the connection between the primal and dual solutions at every step. 
Since we are considering in (2.3) the infimum of a convex function over a 
closed convex region, the solution exists and is unique (see also [6]). 

Theorem 3.1. We assume that Q is a given PM and there exists V € C 
such that I(V\Q) < oo. We also assume that the sequence of primal solutions 
dP n ^/dQ £ L\{Q), Assumptions A and B hold and the corresponding dual 
solutions y n ^ £ M(VL,!F) and y n i are uniformly integrable. Then as n — ► oo, 
we have the following: 

1. There exist (unique) xo E L\{Q), yo £ M(0, !F) such that f x^dQ is a 
PM, dP n ,i/dQ —> xo; Xq, yo solve the primal and dual problems, respec- 
tively, and xo = e yo / J e yo dQ. 

2. Moreover, if x = dP*/dQ (P* £ C), then I(P*\P n>i ) -> Vt. Also, 

dPnA dP* 



IP . _ p* 



dQ^O Vi, 



dQ dQ 

where \\P n i — P*\\ is the total variation distance between P n i and P* . 
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3- Ei=lHPn,i\Sn,i)^I(P*\Q)- 

Proof. 1. Using Lemmas 3.4 and 3.5, I(P n ,i\P n ,i-i) — ► as n — > oo, 
hence, by using the well-known relation [5, 6] 



(3.14) 



\P-R\\ < (2I(P\R)) 



1/2 



for any two PM's P and R, it follows that ||-P n ,i — Pn,i-i [ | — >■ for all 2 < i < t, 
that is, 



(3.15) 

Using (3.8), we have 



dPn,i dP n ^i—\ 



dQ dQ 



dQ^O. 



(3.16) 



/ dPn,t AP n ,i 



dQ 



E fy^i E 

i=l J U=i+1 



(dPn,s dP ns —\ 
~d~Q dQ 



dQ. 



From (3.15), \dP n}t /dQ-dP n>i /dQ\ (a.e. Q) and it follows that dP n j/dQ € 
p|* = i A^i (a.e. Q) for sufficiently large n. Since X^=i yn,« S 0* = i Kf , we get 

The (a.e. Q) boundedness of \dP Tl) t/ dQ — dP n ^/ dQ\ and uniform integrability 
of ynA imply from (3.16) that 

limsup /( ]^^d(5 = limsup [ ( J2y n ,i ) (0) dQ = 0. 

n.->oo J y i=1 y Cltt/ ra->oo J \ i= i J 



Hence, 



0. 



Adding and subtracting In / e^-'i=i 2/n ' i <iQ to the left-hand side of (3.16), we 
get 



(3.17) 



lim / 



dP n ,i 

dQ 



It follows from Lemma 3.6 that e^i=\ Vn ^ is uniformly integrable. Thus, 
eSi=i2 / ™. i is tight, and hence, relatively compact ([4], pages 35-41). So, given 
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any sequence of increasing positive numbers, there is a subsequence, say, 
{rij}, and an element xo € M(Q, J-) such that 

(3.18) e^=i^' l ^x 
[1, 22]. This means 

(3.19) J e^=i Vn r l xdQ-> J x xdQ VxeLi(Q) 

as j — > oo. Also, x £ Loo(Q) 4i£ L\(Q). Using the convergence result 

in (3.19) only for x € M(r2,^"), we get the weak convergence of e^= lS/n J' 1 
in L\(Q). Thus 

(3.20) e ELiW»*.< Ax,, inLi(Q). 
Using the constant function 1 E M(f2,.F), we obtain 

(3.21) / e £!=i«S.*dQ_» / xqcLQ 



as j — > oo. Prom Jensen's inequality and uniform integrability of y n ,i, it 
follows that 

/" e ELi^.^Q> e /Ei=i^.* <i Q>o. 

Hence, from (3.21), f xodQ > 0. 

Combining (3.20) and (3.21), we obtain 

(3.22) ^ eELl ^" «■ *° 



d Q j e E i= i w»j ,* rfQ Jx dQ' 
Since / is lower semicontinuous, we have 
(3.23) liminf/(^^)>/( x ° 



Since / is convex and /C is a closed convex cone, the solution to inf xg ^ f(x) 
exists uniquely. From (2.6), if we can show for some xq € K.,yo S /C®, /(xo/ 
J xodQ) + f*(yo) < 0, then xo/ / x^dQ,y^ are (unique) solutions to the 
primal and dual problems, respectively. We show below that xq/ J XodQ 
in (3.22) can be used for this purpose. So if the sequence dP n j/dQ has many 
limit points, they must be equal (a.e. Q). Let yo = lnxo- Then using (3.17), 
(3.21) and (3.23), one gets 
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■dP n 



<liminf/( — + lim In f e ^*^ Vn i' i dQ 
rij- *oo \ dQ J n ^°° J 

(3.24) 



;dP, 

< lira inf < / 



dQ 



+r(l>,)} 



= 0. 



To complete the proof, we need to show that xq G (\=l ^ an< ^ 2/0 € 

(nti^) ffi - 

Using (3.6) and (3.8) for i = t, we get / ydP nj j > for y G /C®, or 

(3.25) J e^^ yn ^'ydQ>0 VyG/Cf. 
Also, by (3.20) 

(3.26) J e^i yn J J ydQ^ J x ydQ VyG/C® 

as j — » oo. Combining (3.25) and (3.26), we obtain / x ydQ > 0, Vy G /C®, 
or equivalently, xo G /Q . 

For any 1 < i < f — 1, with y G /C®, by using (3.8), we can say 



t-i 



Since 

limsup ( V^{ d , n / ) '°' ~ d n J^. +1 \dQ = lim sup fy(0)dQ = 
and by using (3.22), 

~ I y dQ ® J^JxodQ ® 

(as j — ► oo) for all y G /C®. Hence, ceo G /Q, 1 < i < t — 1. Hence, xq G fli=i ^i- 
Now we show that yo G (ff i=1 )Ci)® . From (3.18), it follows that for the 
subsequence {rij} there exists y such that 

(3-27) Y.yn^y. 

i=i 

By lower semicontinuity of /*, 

(3.28) limjnfrf^y^J >/*(y). 
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Using (3.28) and arguments similar to (3.24), it follows that 

For x G n'=i^Q> it follows that /(X)i=i Vn^ijxdQ > since Td^iVn^i € 
(n-=i^)®- By (3.27), /xy > 0. Hence, y G (flLi**)®, and x /Jx dQ, 
y solve the primal, dual problems, respectively. Since we have already es- 
tablished that xq/ j xodQ,yo solve the primal, dual problems, respectively, 
we must have yo = y (a.e. Q). 

2. For P* G C, if dP*/dQ = x (a.e. Q), then 

(as n — > oo) by continuity of the In function. By (3.14), it follows that P n ^ 
converges to P* in total variation. 

3. From (3.10), we have 

mP,,) = i(v\Q)-±J{% - ^) *-p<TM 

Setting V = P*, since dP aJ /dQ ^ dP*/dQ and I(P*\P n ^ -> as n -»■ oo, 
we get the desired result. □ 

As mentioned earlier, we recommend that one can monitor the values of 
Mi and M<i while executing the algorithm. This can be done by inserting an 
extra step of computation while writing a computer program for the algo- 
rithm. If the algorithm is not going to converge correctly, then M\ and/or 
M2 should become excessively large. Otherwise the algorithm must converge 
to the correct solution. We find it extremely difficult to construct examples 
where either M\ or M2 might be infinite. Thus, it seems that such cases 
will only be possible under quite unusual circumstances (although we can- 
not prove or disprove this). We discuss the Assumptions A and B following 
Example 3.2 for different choices of and Cj's in Remarks 3.1-3.4. 

If Un,i were uniformly bounded [which would be true if we used L OQ (Q) 
as the dual space instead of M(Q, J-)}, then the uniform integrability con- 
dition of y n< i is easily satisfied. Although the assumption that the y n /s are 
uniformly integrable instead of uniformly bounded makes the proof of The- 
orem 3.1 rather complicated, this covers many situations which would not 
be possible otherwise, as the next example shows. 
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Example 3.2. Suppose we wish to find the I-projection of the PM 
Q with density (with respect to Lebesgue measure on the unit square) 
given by q(x, y) = (4/5)(l + xy), < x, y < 1, onto C\ n C 2 , where C\ = {P : 
E P (\nX) > -0.5}, C 2 = {P:E P {X + Y)> 1.3}. Writing C x = {P:/(0.5 + 
\ux){dP/dQ)dQ > 0} and C 2 = {P : f(x + y - 1.3) (dP/dQ) dQ > 0}, 
it follows that the conjugate cones are given by Cf = {y(u,v) :y(u,v) = 
a(0.5 + In it), a > Q},Cf = {y(u,v):y{u,v) = (3(u + v - 1.3), (3 > 0}. While 
checking the assumptions, we note that the y n ,i(S C®)'s are unbounded, 
but uniformly integrable and the y ra ,2(€ C®)'s are uniformly bounded (so 
uniformly integrable). After six cycles we obtain / diS^i = 0.8598, / dSe,2 = 
0.9049 and / \n(dS 6A /dQ) dP 6 ,i = 2.81 x 10" 4 , / ln(dS 6}2 /dQ) dP 6j2 = -3.25 x 
10~ 4 [these integrals settle around these values for slightly higher (n, i)'s 
also]. Thus, we assume that Assumptions A and B hold. Applying the algo- 
rithm, after six cycles we obtain Pq^i where dP^p/dX = e 1 - 0394 ( u + t ') n - 3757 (i-|- 
«u)/3.3451. Noting ^p B2 (lnX) = -0.4992 and Ep 6 2 (X + Y) = 1.300 and as- 
suming that convergence is essentially obtained, one may take Pq^ as a good 
approximation for P* , the solution to the I-projection problem. In most cases 
(including this example) numerical integration techniques are needed for the 
above calculations. 

It may be noted that many of the maximum entropy characterizations of 
families of distributions (e.g., [17]) can be obtained following our procedure. 
In Example 3.1, the solution R* have maximum entropy in the class of all 
distributions on (0, 1) which have the first two moments at least 0.7. 

Remark 3.1. If the space ft were discrete and finite, and the Cj's were 
linear, variation-closed sets, then our procedure would reduce to that of 
Csiszar [6] with the interpretation that s nt i(k) = p n -i,t(k), s nt i(k) =p n ,i-i(k), 
n>2. Replacing the integrals by sums, here Assumption A holds easily since 

J2k s n,i( k ) = 1, Vn,i. Also, Efc m (Sn,i(&)M&))Pn,i(fc) = Efc( ln S n ,i{ k ))Pn,i{ k ) + 

Ek(-^Q(k))xPn,i( k ) <J2k(- ln Q( k ))Pn,i( k ) <Efe(-W*0)<°°;thus As- 
sumption B holds for q{k) > (note q(k) = implies s n: i{k) = p n ,i( k ) = 0, 
Vn, i with 0/0 = 0; see the discussion following Lemma 3.1 and [3]). The 
pointwise convergence p n ,i(k) -^p*{k) Vk is attained without assuming uni- 
form integrability of the y n i, as can be seen by mimicking the arguments of 
Bhattacharya and Dykstra [3] when applied for the linear sets. 

Remark 3.2. If the space £1 were discrete and finite, and the Cj's 
were convex, variation-closed sets, our procedure would reduce to that of 
Bhattacharya and Dykstra [3] (also [10]). Here the Sn/s are only posi- 
tive measures, and hence one must make Assumption A (noted by Dyk- 
stra [10] also). However, Assumption A implies that s nj j(fc) <M\ Vfc [since 
s n>i (k) > 0,Vn,i]. Then J2k fo(s n>i (k) / q(k))p n>i (k) = J2k ins n,i( k )Pn,i( k ) + 
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Efc(-Wfc))p nii (fc) < lnM i + Ek(-M*0) < oo [for q(k) > 0, Vfc]; thus, 
Assumption B holds. One need not make the assumption of uniform inte- 
grability of y n j as shown by Bhattacharya and Dykstra [3]. 

Remark 3.3. If Q were discrete but with infinite support (so Bhat- 
tacharya and Dykstra [3] does not apply), then Assumption A holds when 
the Cj's are linear and must be assumed when the Cj's are convex. However, 
in general, it seems intractable to show that M2 < 00. One of the possible 
reasons is that q(k) could be positive but infinitely small for many k. Con- 
sequently, Assumption B has to be made. Also, uniform integrability of y n ^ 
has to be assumed. 

Remark 3.4. If the space VL were infinite-dimensional and the Cj's were 
linear, variation-closed sets, our procedure would generalize to the infinite- 
dimensional case of Csiszar [6] (which remains unsolved still now), with the 
understanding that S n> \ = P n -i,t,Sn,i = Pn,i—l^ > 2,Vn (see the derivation 
following Lemma 3.1). In this case, Assumption A holds easily since J dS nt i = 
l,Wn,i. For checking Assumption B, first note that, for % > 2, f]n(dP n j-i/ 
dQ)dP n! i = -I(P n ,i\P n ,i-i) + I(Pn,i\Q) (similarly for i = 1). Although 
I(P n i[Pn,i— l) as ?w 00, in general it seems to be intractable to show 
that sup n j I{P n ^\Q) < 00. But using particular definitions of the sets Cj, 
this may be possible; for example, Riischendorf [25] imposes restrictions on 
marginal densities (see his conditions B2, B3 and Lemma 4.4) and verifies 
that sup n j I(P n> i\Q) < 00 for IPFP. In general, the uniform integrability of 
y n i also has to be assumed in this case. For IPFP, Riischendorf [25] also 
made this assumption, but shows that this follows from his conditions B2, 
B3 on marginal densities (see his Proposition 3.2). 

Remark 3.5. The assumption that there exists some V EC such that 
I(V\Q) < 00, is the same as the assumption of Csiszar ([6] Theorem 2.1) 
that S(Q, oo)nC^0, where S(Q, p) = {P 1 : 1{P\Q) < p}, < p < 00, is the 
I-sphere with center Q and radius p. Without this assumption, one may 
not be able to find an individual I-projection [(1.1) and afterward] P n ^, the 
terms dS n: i/dQ or dP n> i/dS n> i may not be defined or be arbitrary, M\ and/or 
Mi may become excessively large, the algorithm may not converge, or any 
combination of these may occur. Also, if the assumption is not valid, then 
the final result (if found) will not correspond to an I-projection by (1.1). 

Remark 3.6. It is one of the difficulties of the iterative procedures in 
optimization methods to specify the distance of the current estimate from 
the actual solution since small changes in the objective function between 
successive steps of the iterative procedure do not guarantee that the ac- 
tual optimal solution is near by. However, from Theorem 3.1, one can use 
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the I-projection P n ^ and the quantity Ya=i P\Pn,i\£>n,i) to estimate P* and 
I(P*\Q), respectively, to arbitrary accuracy by choosing n sufficiently large. 
Yet another bound can be obtained in the following way. Let P £ n*=i Ci be 
close (in I-divergence sense) to P n ,t, and P* be the actual solution to our 
problem. We also assume that P is known a priori, or may be obtained by 
least square projection onto a subset of Di=i Q 01 by some other method. 
Since by substituting V = P* in (3.11), we get I{P*\Q) > Td =1 I(P n! i\S nii ) 
for sufficiently large n, it follows that 



Consequently, using the leftmost term, we can specify an upper bound on 
both the I-divergence distance and the variation distance between the PM 
P and the true solution P*. For instance, in Example 3.2 if one chooses the 
PM P, where dP/dX = C e a ^ +lnu ^ + ^ u+v - 1 ^ {A/h)\l + uv) (c = normalizing 
constant) with a = 0.4, (3 = 1.04, then it is easily verified that P is in C\ (IC2, 
and I(P\Q) - Ef=i I(Pe,i\Se,i) = 0.0064. 

Remark 3.7. The rate of convergence of the algorithm depends largely 
on the nature of the Cj's. If all the Cj's are orthogonal with each other (in 
terms of I-divergence), a single pass through each constraint will suffice. On 
the other hand, if one constraint has a narrow angle (in terms of I-divergence) 
with another, then many cycles will be needed to achieve a desired level of 
convergence (see [13]). 

4. Marginal stochastic orders. Since the introduction of the IPFP by 

Deming and Stephan [9], it has been widely used in many different fields. 
For a given bivariate density function, Kullback [20] considered the problem 
of matching (approximating) its marginal density functions to given univari- 
ate densities using an iterative algorithm (similar to IPFP in the discrete 
case). Recently, Riischendorf [25] proved the convergence of this iterative 
algorithm under some regularity conditions. The purpose of this section is 
to view the problem considered by Riischendorf [25] from a duality perspec- 
tive, and to extend it to the case of (separate) row and column marginal 
stochastic orders. The process generalizes naturally to higher dimensions 
with restrictions involving possibly more than one marginal at a time. 

Let Q be a fixed, bivariate PM on 1Z 2 with X-marginal Qx and Y- 
marginal Qy . Also, let Gx and Gy be given univariate PM's on 1Z which 
are absolutely continuous with respect to Qx and Qy, respectively. We 



I{P\Q) -^2l(P n ,i\Sn,i) > I(P\Q) ~ I(P*\Q) 



i=l 
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consider the problem of finding the I-projection of Q onto S(Gx , Gy), 
the class of all bivariate PM's whose X-marginal (Y-marginal) is stochas- 
tically greater than or equal to Gx (Gy)- Thus, S(Gx,Gy) = Si n S2, 
where Si = {P :J(I Ax ~ G x (x)){dP / dQ) dQ < 0, V x £K}, S 2 = {P :J{I By ~ 
G Y {y))(dP/dQ)dQ < 0, My € 11} with I Ax = {(u,v):-oc < u < x,-oo < 
v < 00}, I By = {(u,v) : -00 < u < 00, -00 < v < y}, G x (x) = Pr Gx (X < x), 
G Y (y) = Pr GY (Y<y). 

Scrutinizing the definition of the conjugate cone, we see that the corre- 
sponding dual problem can be expressed as inf^e^© / e y dQ, where y = 

y(u,v) = y 1 (u) + y 2 (v)eSf > ®S^ and 

<S® = ly(u,v):y(u,v) = y\(u),y\(u) is nondecreasing, J y\(u) dG x{u) = 

y(u,v):y(u,v) = y2(v),y2(v) is nondecreasing, J y2(v) dGy(v) = 

Let Eq(g\Z) denote the least square projection (or isotonic regression) of 
g onto I, the cone of all nondecreasing functions on C 2 (Q) ([24], Chapter 8) 
with weights Q. Following the algorithm of Section 3, we first find the I- 
projection of Q onto <Si. Defining 



y hl (u,v) = ln E, 



(dG x 



\dQx 

-/" l Hl^l I )) (^,<i0 * (t, • 

it can be seen that 2/1,1 (it, v)(= 2/1,1(1*)) € «S®, and for any y(u, v)(= y(u)) € Sf 
we have (d/da) J e w.i+«(v-wi.i) dQ\ a=0 = J(y(t) - y hl (t))dG x (t) = 0, for 
< a < 1, which implies that y\ \ solves the dual problem. From Theo- 
rem 2.2, Eq(dGx /dQx \I) = a* (say) solves the primal I-projection problem. 
If (dQ/d(Qx <&Qy))(u, v) = h(u, v) and P* i is the I-projection at the nth cy- 
cle onto Si , then we can express dP* 1 / d(Qx <8> Qy) = (dP\ 1 / dQ) (dQ / d(Qx <8> 
Qy)) = a\h. Next we find the I-projection of P* x onto £2- Following the 
last projection, it is given by dP^/dP*^ = E p *Y(dG Y / 'dP*J\2) = b\ (say), 
where Pf{ is the y-marginal of Pf 1 . Hence, we can write 



dPi, 2 dP^ 2 dP?, dQ 



d{Qx®Qv) dPTi dQ d{Qx®Qy) 



(dG Y T \ F (dGx \ dQ 
Ep H [dPtJ 1 )^^ 1 ) d(Q x <S> Qy) 

blalh. 
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To begin cycle 2, we first form W2 ) i{A) = j A bl(v) dQ, A C 1Z 2 . The I-projection, 
P 2 *n of onto Si is given by (dP^/diQx ® Qy)) = (dP 2 *i/ 

dM / 2,i)(dW2,i/rfQ)(^Q/d(Qx <8> Qy)) = et^/i. Thus, in general, after the 
nth cycle, dP* 1 /d(Q x ® Qy) = a^K^h, dP* 2 /d(Q x ® Qy) = a* n b* n h, where 
<(«) = £ p *x (dG x /dP** )2 |Z)(u), &*(«) = £? p .v(dGy/dP*Y By The- 

n — 1,2 n,l 1 

orem 3.1, there exist a*,b* such that a* 6* — » a*6* (in variation), and we get 
dP* /d(Q x ®Qy)(u,v) = a*(u)b*(v)h(u,v). 

To verify Assumption A, we note that sup n j / dW n ^ < oo simplifies to 
sup n {J a* n dQx , J bndQy} < oo. Since 

[ ln(dS n ,i/dQ) dP n ,i = /(ln6*_ 1 )a*6*_ 1 dQ 

J &n-i m& n-i d QyJ (Y a* dQx 



\n{dS n ,2/dQ)dP n ,2 = / (In a* )a* 6* ciQ 

y a* In a* dQx^j(^J 6*dQyJ, 

for Assumption B it is enough to assume that sup n {J a* In a* dQx , / &n x 
In 6* <iQy} < oo along with Assumption A. One also has to assume that 
In a* and In 6* are uniformly integrable (a.e. Q). The stochastic ordering 
problem considered above would reduce to that of Riischendorf [25] and 
Rtischendorf and Thomsen [26] if there were equality in the definitions of 
S± and <S 2 - The nondecreasing restrictions in the conjugate cones would be 
replaced by equalities, and in the algorithm, W^i, W n>2 taken to be the same 
as Pn—\,2i Pn,ii respectively, Vn > 2. Here, the above assumptions follow 
easily from the conditions imposed by Riischendorf [25] on marginal densities 
(using a* = a n , b* n = b n and using 2.7 and 2.8 of [25]). 

A modified set of Schrodinger (in)equalities in this situation can be ex- 
pressed as 

x roc rx 

\ a*(u)b*(v)h(u,v)dvdu> / gx{u)du Vx £ 1Z, 
-00 J — 00 J — 00 

00 ry ry 

/ a*(u)b* (v)h(u,v) dv du > / gY{v)dv \/y£lZ, 
-00^—00 J— 00 

where gx,9Y are fixed marginal densities. However, we note that they are 
not easy to solve for a*,b* , and the method described in this section would 
be needed to obtain the solution. 

The above process can be extended to more than two variables, and the 
restrictions may involve more than one marginal at a time. The functions 
a* and b* will not be functions of u and v alone in this case. Theorem 3.1 
yields the following. 
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Theorem 4.1. Let Q be a p-variate PM. Let nij C {1, .. . ,p}, 1 < i < 
k, be subsets of indices. Let Q mi be the marginal PM of Q for the in- 
dices in mi, and Si be the closed convex cone that Q mi is stochastically 
greater than or equal to G mi ( fixed). The I-projection P* of Q onto S = 
f\™ =1 Si uniquely exists, and there exist such that (dP* / dQ)(u\, . . . ,u p ) = 
a i( u m 1 )a>2(um 2 ) ■ ■ ■ a k( u m k ), where u mi is a vector with coordinates from 
u± , . . . , u p for indices in . 

As final comments, in this paper we have investigated the theoretical as- 
pects of the proposed algorithm along with some simple examples to demon- 
strate its use. The cases of infinite-dimensional IPFP and marginal stochas- 
tic orders are illustrated from a duality perspective. The algorithm assumes 
that we are able to find the I-projection onto the individual sets Cj. Simple 
cases, such as the ones in Examples 3.1 and 3.2, can be solved by Maple, 
Mathematica or IMSL routines. In other cases the level of difficulty will de- 
pend on our ability to compute the I-projection on the Cj's. Of course, with 
the invention of fast computing techniques, many such difficult tasks are 
well within reach. Thus, more research regarding computing may be needed 
to implement the algorithm in those cases. 
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